Abstract

This paper deals with a situation of some importance for the analysis of experimental data via
Neural Network (NN) or similar devices: Let $N$ data be given, such that $N = N_s + N_b$, where
$N_s$ is the number of signals and $N_b$ the number of background events, both unknown. Assume
that a NN has been trained such that it will tag signals with efficiency $F_s$ ($0 < F_s < 1$)
and background data with $F_b$ ($0 < F_b < 1$). Applying the NN yields $N^Y$ tagged events.
We demonstrate that the knowledge of $N^Y$ is sufficient to calculate confidence bounds for
the signal likelihood, which have the same statistical interpretation as the Clopper-Pearson
bounds for the well-studied case of direct signal observation.

Subsequently, we discuss rigorous bounds for the a-posteriori distribution function of the
signal probability, as well as for the (closely related) likelihood that there are $N_s$ signals in
the data. We compare them with results obtained by starting off with a maximum entropy
type assumption for the a-priori likelihood that there are $N_s$ signals in the data and applying
the Bayesian theorem. Difficulties are encountered with the latter method.

1 Introduction

Let us assume that $N_s$ signals are observed in $N$ data,

$$ N = N_s + N_b , $$

where $N_b$ is the number of background events. We denote the a-priori unknown signal
likelihood by $p$. Relying on the binomial distribution, Clopper and Pearson [1] derived a
method which allows one to calculate rigorous confidence bounds on $p$, given $N_s$ and $N$. Now,
in modern physics, in particular in high energy physics experiments, it happens quite often
that signal and background events belong to overlapping probability densities in a
multi-dimensional parameter space. In such situations signals can only be identified in a statistical
sense. Typically, some method may allow one to tag signals and background events with different
efficiencies: $F_s$ for the signals ($0 < F_s < 1$) and $F_b$ for the background events ($0 < F_b < 1$).
Instead of observing $N_s$ signals we only get $N^Y$ tagged data.

The question is, what confidence limits on the signal likelihood are then implied? We prove,
and illustrate in some detail, that the Clopper-Pearson method can be generalized accordingly.

In particular, we have high energy physics experimental data in mind, where tagging
may be provided by traditional cuts or by applying some NN [?] technique. To
give an example, figure 5 of Ref. [?] depicts values for neural network efficiencies, $F_s(Y)$ and
$F_b(Y)$, which one may expect to occur for identifying $t\bar{t}$ events in the All-Jets channel [?].
Running the network on all $N$ data assigns to each event a value of the network function
$Y_n$, $n = 1, \dots, N$. For a fixed choice of $Y$, the network returns $N^Y$ events with $Y_n < Y$.
An additional problem in real applications may be that the efficiencies $F_s$ and $F_b$ are not
exact either. However, as outlined in the conclusions, we think that this difficulty may be
overcome by the bootstrap approach [?].

In the next section we explain and generalize the Clopper-Pearson approach. A number
of illustrations focus on the small number of $N = 10$ data, because then the statistical
meaning of the confidence bounds becomes most transparent. In section 3 we deal with the
limit of a small number of signals hidden in a large data set. Two instructive sets of network
efficiencies are chosen to demonstrate how the general equations are expected to work in
practice.

Section 4 considers a-posteriori distribution functions: (i) for the signal probability,

$$ F(p) = \int_0^p P(p')\, dp' , $$

where $P(p)$ is the probability density of $p$, and (ii) for the likelihood that there are $N_s$
signals in the data set,

$$ F(N_s) = \sum_{k=0}^{N_s} P(k) , $$

where $P(N_s)$ is the probability that there are $N_s$ signals in the data. Rigorous lower
and upper bounds are provided. For the examples of section 3 those bounds are close
together, such that useful approximations of the true a-posteriori distribution functions result.
In section 5 these results are compared with constructing the $F(N_s)$ distribution function
and its $P(N_s)$ probability density with the Bayesian method under the maximum entropy
assumption that each $N_s$ is, a-priori, equally likely. One of the results obtained in this way,
and hence its a-priori assumption, violates an exact bound. Conclusions follow in the final
section 6.

2 From Neural Network Output to Confidence Limits 

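Let $p$ be the (unknown) exact likelihood that a data point is a signal. The probability to
observe $N_s$ signals within $N$ measurements is given by the binomial probability density

$$ b(N_s|N,p) = \binom{N}{N_s}\, p^{N_s}\, (1-p)^{N-N_s} , \qquad \sum_{k=0}^{N} b(k|N,p) = 1 . \tag{1} $$
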
We are faced with the inverse problem: if $N_s$ signals are observed, what is the confidence to
rule out certain $p$? Assume that probabilities $p^c_-$ and $p^c_+$ are given. Clopper and Pearson [1]
define corresponding lower $p_-$ and upper $p_+$ bounds as solutions of the equations

$$ p^c_- = \sum_{k=N_s}^{N} b(k|N,p_-) \quad \text{and} \quad p^c_+ = \sum_{k=0}^{N_s} b(k|N,p_+) \tag{2} $$

with the additional convention $p_- = 0$ for $N_s = 0$ and $p_+ = 1$ for $N_s = N$. Figure 1
illustrates how $p_-$ and $p_+$ are obtained as parameters of the binomial distributions which
yield the areas $p^c_-$ and $p^c_+$ as indicated. For this figure we have chosen $N = 26$k and $N_s = 130$,
in the ballpark of values which will interest us in the next section. Here and in the following,
binomial coefficients have been calculated relying on Fortran routines of [?].
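
As an illustration of equation (2), the following is a minimal Python sketch and not the Fortran implementation used for the paper. It assumes a plain bisection search and evaluates the binomial terms through `math.lgamma`; the function names (`log_b`, `lower_tail`, `bisect`, `clopper_pearson`) are hypothetical.

```python
import math

def log_b(k, n, p):
    """Logarithm of the binomial probability b(k|n,p) of eq. (1), 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def lower_tail(k, n, p):
    """sum_{j=0}^{k} b(j|n,p), with the limits p = 0 and p = 1 handled explicitly."""
    if k < 0:
        return 0.0
    if p <= 0.0 or k >= n:
        return 1.0
    if p >= 1.0:
        return 0.0
    return sum(math.exp(log_b(j, n, p)) for j in range(k + 1))

def bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a monotone function f on [lo, hi], assuming a sign change."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == (f_lo > 0.0):
            lo = mid          # root lies to the right of mid
        else:
            hi = mid          # root lies to the left of mid
    return 0.5 * (lo + hi)

def clopper_pearson(n_s, n, pc_minus, pc_plus):
    """Solve eq. (2) for (p_minus, p_plus), using the conventions of the text."""
    # Upper tail sum_{k=Ns}^{N} b is written as 1 - sum_{k=0}^{Ns-1} b.
    p_minus = 0.0 if n_s == 0 else bisect(
        lambda p: (1.0 - lower_tail(n_s - 1, n, p)) - pc_minus)
    p_plus = 1.0 if n_s == n else bisect(
        lambda p: lower_tail(n_s, n, p) - pc_plus)
    return p_minus, p_plus

# Parameters of figure 1: N = 26000, Ns = 130.
print(clopper_pearson(130, 26000, 0.159, 0.159))
print(clopper_pearson(130, 26000, 0.023, 0.023))
```

For $N_s = 0$ the upper bound has the closed form $p_+ = 1 - (p^c_+)^{1/N}$, which provides a simple check of the root finder.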

The precise meaning of the bounds is as follows: $p_-$ is the largest number such that
(for every feasible $p$) the probability for $p < p_-$ is less than $p^c_-$. Correspondingly, $p_+$ is the
smallest number such that the probability for $p > p_+$ is less than $p^c_+$. The other way round,

$$ p > p_- \quad \text{with likelihood} \quad P^c_-(p) \ge (1 - p^c_-) \tag{2a} $$

and

$$ p < p_+ \quad \text{with likelihood} \quad P^c_+(p) \ge (1 - p^c_+) . \tag{2b} $$

Therefore, for $p_- < p_+$ we find

$$ p \in [p_-, p_+] \quad \text{with likelihood} \quad P^c(p) = P^c_-(p) + P^c_+(p) - 1 \ge (1 - p^c_- - p^c_+) . \tag{2c} $$

It is instructive to illustrate these equations for a small value of $N$. Choosing $N = 10$ and
$p^c_- = p^c_+ = 0.159$, the precise $p$-dependence of $P^c_+$ (2b) and of $P^c$ (2c) is depicted in figure 2.
The equality $P^c_+(p) = 1 - p^c_+$ is assumed at the discrete values $p = p_+(N_s)$, $N_s = 0, 1, \dots, N$.
For example, as long as $p < p_+(0)$ holds, $p$ will with certainty be smaller than any $p_+$ bound.
As $p$ passes through $p_+(0)$, the probability $P^c_+(p)$ jumps down to the value $1 - p^c_+ = 0.841$.
Subsequently $P^c_+(p)$ rises with $p$ in the range $p_+(0) < p < p_+(1)$ until, at $p = p_+(1)$, the next
jump occurs, and so on. The corresponding graph for $P^c_-(p)$ follows from $P^c_+(p)$ by reflection
on the $p = 0.5$ axis. The lower, full curve of figure 2 is obtained by combining both according
to eqn. (2c).
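
The saw-tooth behaviour described above can be reproduced numerically. The sketch below is an illustration under stated assumptions: it computes the upper bounds $p_+(N_s)$ for $N = 10$ by bisection and then evaluates $P^c_+(p)$ as the total binomial weight of all $N_s$ whose bound exceeds $p$. The names `p_plus`, `BOUNDS` and `coverage_plus` are hypothetical.

```python
import math

N, PC_PLUS = 10, 0.159

def b(k, n, p):
    """Binomial probability b(k|n,p) of eq. (1)."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def p_plus(n_s, n, pc):
    """Upper Clopper-Pearson bound from eq. (2); p_+ = 1 for Ns = N."""
    if n_s == n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        # The lower tail decreases with p: if it is still above pc,
        # the solution lies further to the right.
        if sum(b(k, n, mid) for k in range(n_s + 1)) > pc:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

BOUNDS = [p_plus(n_s, N, PC_PLUS) for n_s in range(N + 1)]

def coverage_plus(p):
    """P^c_+(p): probability that the observed Ns yields a bound p_+(Ns) > p."""
    return sum(b(n_s, N, p) for n_s in range(N + 1) if p < BOUNDS[n_s])

# Below p_+(0) the coverage is 1; just above it drops to about 1 - 0.159.
print(coverage_plus(0.5 * BOUNDS[0]), coverage_plus(BOUNDS[0] + 1e-6))
```

Scanning `coverage_plus` over a grid of $p$ values shows that it never falls below $1 - p^c_+$, which is the content of (2b).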

We are interested in the more involved situation where signal and background can no
longer be distinguished unambiguously. Instead, a neural network or similar device yields
statistical information by tagging signals with efficiency $F_s$ and background data with
efficiency $F_b$ ($0 < F_b < 1$, $0 < F_s < 1$ and, typically, $F_b \ll F_s$). Applying the network to all
$N$ data results in $N^Y$ ($0 \le N^Y \le N$) tagged data, composed of $N^Y = N^Y_s + N^Y_b$, where $N^Y_s$
are the tagged signals and $N^Y_b$ are the tagged background data. Of course, the values for
$N^Y_s$ and $N^Y_b$ are not known. Our task is to determine confidence levels for the signal likelihood
$p$ from the sole knowledge of $N^Y$. We proceed by writing down the probability density of
$N^Y$ for given $p$ and, subsequently, generalizing the Clopper-Pearson method.

First, assume fixed $N_s$. The probability densities of $N^Y_s$ and $N^Y_b$ are binomial and thus
the probability density for $N^Y$ is given by the convolution

$$ P(N^Y|N_s) = \sum_{N^Y_s + N^Y_b = N^Y} b(N^Y_s|N_s,F_s)\, b(N^Y_b|N_b,F_b) , \qquad N_b = N - N_s . \tag{3} $$

Summing over $N_s$ removes the constraint and the $N^Y$-probability density, with $N$, $p$ fixed,
is

$$ P(N^Y|N,p) = \sum_{N_s=0}^{N} b(N_s|N,p)\, P(N^Y|N_s) . \tag{4} $$
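
Equations (3) and (4) translate directly into nested sums. The following minimal sketch (hypothetical function names, small $N$ only) evaluates them by brute force. As a cross-check that is not part of the text, it also uses the observation that each event is tagged independently with probability $q = p F_s + (1-p) F_b$, so that the result of (4) should coincide with the binomial $b(N^Y|N,q)$.

```python
import math

def b(k, n, p):
    """Binomial probability b(k|n,p) of eq. (1); zero outside 0 <= k <= n."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def tag_prob_given_ns(n_y, n_s, n, f_s, f_b):
    """Eq. (3): P(N^Y|Ns) as the convolution of two binomials."""
    n_b = n - n_s
    # j counts tagged signals, n_y - j counts tagged background events.
    return sum(b(j, n_s, f_s) * b(n_y - j, n_b, f_b) for j in range(n_y + 1))

def tag_prob(n_y, n, p, f_s, f_b):
    """Eq. (4): P(N^Y|N,p), obtained by summing eq. (3) over Ns."""
    return sum(b(n_s, n, p) * tag_prob_given_ns(n_y, n_s, n, f_s, f_b)
               for n_s in range(n + 1))

# Small example: N = 10, p = 0.3, Fs = 0.9, Fb = 0.2.
N, P, FS, FB = 10, 0.3, 0.9, 0.2
q = P * FS + (1.0 - P) * FB   # per-event tag probability (cross-check only)
for n_y in range(N + 1):
    print(n_y, tag_prob(n_y, N, P, FS, FB), b(n_y, N, q))
```

The brute-force sums cost of order $N^2$ operations per value of $N^Y$, which is harmless for $N = 10$ but motivates a more economical evaluation for large data sets.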

For given $p^c_-$, $p^c_+$ and $N^Y$, we define confidence limits $p_-$ and $p_+$ in analogy with equation (2):

$$ p^c_- = \sum_{k=N^Y}^{N} P(k|N,p_-) \quad \text{and} \quad p^c_+ = \sum_{k=0}^{N^Y} P(k|N,p_+) . \tag{5} $$

Their meaning is as already outlined by equations (2a)-(2c). Choosing $F_s = 0.9$, $F_b = 0.2$
and the other parameters as before, figure 3 illustrates these equations for the new situation.
The interpretation is as for figure 2 with two remarkable exceptions:

(i) It may happen that eqn. (5) has no solutions $p_-(N^Y)$ for certain $N^Y = N, N-1, \dots$ or
    no solutions $p_+(N^Y)$ for certain $N^Y = 0, 1, \dots$. The reason is that, due to the NN,
    the result is sufficiently unlikely for all $p$. One may then either decrease $p^c_-$ or $p^c_+$ or
    discard the entire analysis. The parameter values of figure 3 are chosen such that there
    is no solution $p_+(0)$. Consequently, there is no longer a range of small $p$-values with
    $P^c_+(p) = 1$. Such exotic NN output (here $N^Y = 0$) is by definition rare.

(ii) For $F_s + F_b \ne 1$, the function $P^c_-(p)$ is no longer a reflection of $P^c_+(p)$. This is shown
    in figure 3, where $P^c(p)$ turns out to be no longer symmetric. In fact, on the r.h.s. of
    figure 3 we observe the same feature as in figure 2: the upper and lower curves agree
    due to $P^c_-(p) = 1$ in this range.
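
Point (i) can be made concrete with a small brute-force sketch of eq. (5) for the parameters of figure 3. This is an illustration only: the solver returns `None` when the corresponding sum never reaches the requested probability for any $p$ in $[0, 1]$, and all function names are hypothetical.

```python
import math

def b(k, n, p):
    """Binomial probability b(k|n,p); zero outside 0 <= k <= n."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def tag_prob(n_y, n, p, f_s, f_b):
    """P(N^Y|N,p) from eqs. (3) and (4), by direct summation."""
    total = 0.0
    for n_s in range(n + 1):
        conv = sum(b(j, n_s, f_s) * b(n_y - j, n - n_s, f_b)
                   for j in range(n_y + 1))
        total += b(n_s, n, p) * conv
    return total

def solve(g, target, iters=60):
    """Bisection for g(p) = target on [0, 1]; None if there is no sign change."""
    g0, g1 = g(0.0) - target, g(1.0) - target
    if g0 * g1 > 0.0:
        return None
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (g(mid) - target) * g0 > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def generalized_bounds(n_y, n, f_s, f_b, pc_minus, pc_plus):
    """Eq. (5): returns (p_minus, p_plus); an entry is None if no solution exists."""
    upper = lambda p: sum(tag_prob(k, n, p, f_s, f_b) for k in range(n_y, n + 1))
    lower = lambda p: sum(tag_prob(k, n, p, f_s, f_b) for k in range(n_y + 1))
    return solve(upper, pc_minus), solve(lower, pc_plus)

# Figure 3 parameters: N = 10, Fs = 0.9, Fb = 0.2, p^c = 0.159.
print(generalized_bounds(0, 10, 0.9, 0.2, 0.159, 0.159))  # no p_+(0)
print(generalized_bounds(5, 10, 0.9, 0.2, 0.159, 0.159))
```

For $N^Y = 0$ the lower sum equals $(1-q)^{10}$ with $q \ge F_b = 0.2$, i.e. at most $0.8^{10} \approx 0.107 < 0.159$, which is why no solution $p_+(0)$ exists for these parameters.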

In summary, the bounds obtained with $p^c_- = p^c_+ = 0.159$ guarantee the standard
one error bar confidence probability of 68.2% for every single $p$-value, and for almost all $p$
the actual confidence will be better. However, the one-sided bounds cannot be improved
without violating the requested confidence probability for some $p$-values. In the same way,
bounds calculated with $p^c_- = p^c_+ = 0.023$ ensure the standard two error bar confidence level
of 95.4% or better, and so on. It should be noted that, for $p \ne 0$ and $p \ne 1$, the deviations
from the requested confidence probabilities tend to decrease in the limit of large statistics.
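
The values 0.159 and 0.023 correspond, to three decimals, to the one-sided Gaussian tail areas beyond one and two standard deviations. The short sketch below (the helper name `gauss_tail` is hypothetical) makes the arithmetic behind the quoted 68.2% and 95.4% explicit.

```python
import math

def gauss_tail(z):
    """One-sided Gaussian tail probability beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

for z, pc in ((1, 0.159), (2, 0.023)):
    # Rounded tail area and the two-sided confidence 1 - p^c_- - p^c_+
    # obtained with the rounded values used in the text.
    print(z, round(gauss_tail(z), 3), 1.0 - 2.0 * pc)
```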

3 Large Data Sets with Few Signals 

We now assume the values of figure 1 to demonstrate the approach in a limit which is of
particular interest for experimental high energy physics applications. With $N = 26$k and
$N_s = 130$ one gets the Clopper-Pearson confidence limits

$$ 0.00456 < p < 0.00547 \quad \text{for} \quad p^c_- = p^c_+ = 0.159 , $$

$$ 0.00416 < p < 0.00595 \quad \text{for} \quad p^c_- = p^c_+ = 0.023 . $$

Next, we assume that the only information about the signals is provided by some NN output,
where we use two sets of efficiencies, inspired by Ref. [?].

First, we consider $F_s = 0.5$ and $F_b = 0.005$. Figure 4 depicts the tag probability density
$P(N^Y|N_s)$, see (3), for three different values of $N_s$: 0, 130 and 260. There is almost no overlap
and, consequently, we expect that clear identification of a positive signal can be achieved. For
$N_s = 130$ the central $N^Y$ values are located around $N_s F_s + N_b F_b = 130\, F_s + (26000 - 130)\, F_b =
194.35$. Using $N^Y = 194$, iteration of equation (5) yields the confidence limits

$$ 0.00389 < p < 0.00613 \quad \text{for} \quad p^c_- = p^c_+ = 0.159 , $$

$$ 0.00289 < p < 0.00729 \quad \text{for} \quad p^c_- = p^c_+ = 0.023 . $$

The computational demand for these results was less than two hours of CPU time on a DEC
3000 Alpha 600 workstation, where it is important to store frequently used coefficients in
RAM.
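
The limits quoted in this section can be cross-checked with a short sketch. It is not the procedure used for the paper: it relies on the assumption (an observation added here, not taken from the text) that each event is tagged independently with probability $q = p F_s + (1-p) F_b$, so that $P(k|N,p)$ of eq. (4) is binomial in $q$ and the sums in eq. (5) become ordinary binomial tails. All names are hypothetical.

```python
import math

def log_b(k, n, q):
    """Logarithm of b(k|n,q) for 0 < q < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log1p(-q))

def lower_tail(k, n, q):
    """sum_{j=0}^{k} b(j|n,q); zero for k < 0."""
    return sum(math.exp(log_b(j, n, q)) for j in range(k + 1)) if k >= 0 else 0.0

def central_tags(n_s, n, f_s, f_b):
    """Expected number of tagged events, Ns*Fs + Nb*Fb."""
    return n_s * f_s + (n - n_s) * f_b

def solve(g, target, iters=60):
    """Bisection for g(p) = target on [0, 1]; None if there is no sign change."""
    g0, g1 = g(0.0) - target, g(1.0) - target
    if g0 * g1 > 0.0:
        return None
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (g(mid) - target) * g0 > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def generalized_bounds(n_y, n, f_s, f_b, pc_minus, pc_plus):
    """Eq. (5) with P(k|N,p) = b(k|N,q), q = p*Fs + (1-p)*Fb (assumed identity)."""
    q_of = lambda p: p * f_s + (1.0 - p) * f_b
    upper = lambda p: 1.0 - lower_tail(n_y - 1, n, q_of(p))  # sum over k >= N^Y
    lower = lambda p: lower_tail(n_y, n, q_of(p))            # sum over k <= N^Y
    return solve(upper, pc_minus), solve(lower, pc_plus)

print(central_tags(130, 26000, 0.5, 0.005))
print(generalized_bounds(194, 26000, 0.5, 0.005, 0.159, 0.159))
print(generalized_bounds(194, 26000, 0.5, 0.005, 0.023, 0.023))
```

Under this assumption each tail requires only of order $N^Y$ terms, so the evaluation is fast even for $N = 26000$. A return value of `None` signals that eq. (5) has no solution for the requested probability.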

Let us reduce the signal efficiency to $F_s = 0.1$ and keep the background (in)efficiency
unchanged. Figure 5 depicts the new tag probability densities. We find considerable overlap
and expect that $p = 0$ can no longer be excluded. The central $N^Y$ values are now located
around 142.35. Using $N^Y = 142$, iteration of equation (5) gives

$$ 0.00005 < p < 0.010 \quad \text{for} \quad p^c_- = p^c_+ = 0.159 , $$

$$ 0.00000 < p < 0.0153 \quad \text{for} \quad p^c_- = p^c_+ = 0.023 . $$

The latter case should be supplemented by the explicit probability for p = 0, estimated in 
the next section. 



4 Signal Probability Distributions 

Equation (5), or of course (2) when applicable, can be used to sandwich the a-posteriori signal
probability distribution $F(p) = F(p|N,N^Y)$ between lower and upper bounds. Namely, it is
easy to see that

$$ F_1(p) = 1 - p^c_+(p) = \sum_{k=N^Y+1}^{N} P(k|N,p) \ \le\ F(p) \ \le\ F_2(p) = p^c_-(p) = \sum_{k=N^Y}^{N} P(k|N,p) . \tag{6} $$

For their numerical evaluation the sums should be re-written as $F_2 = 1 - \sum_{k=0}^{N^Y-1} P(k|N,p)$
and $F_1 = F_2 - P(N^Y|N,p)$. Figure 6 depicts these functions for the previously discussed
examples $N^Y = 142$ and 194. Upper and lower bounds are seen to be close together, such that
$F(p) \approx (F_1(p) + F_2(p))/2$ would be a reasonable working approximation. The corresponding
probability densities are the derivatives with respect to $p$. Their numerical calculation is
straightforward when analytical expressions for the derivatives of the binomial probabilities
entering equation (4) are used. Figure 7 exhibits the results, $P_1(p|N,N^Y)$ and $P_2(p|N,N^Y)$. At
$p = 0$ the probability densities have $\delta$-function contributions

$$ P_i(p) = F_i(0)\, \delta(p) + \dots , \quad (i = 1, 2) \quad \text{with} \quad F_1(0) = 0.136 \ \text{and} \ F_2(0) = 0.156 . $$
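
The $\delta$-function weights can be checked with a few lines. At $p = 0$ only $N_s = 0$ contributes to eq. (4), so $P(k|N,0) = b(k|N,F_b)$ and the bounds of eq. (6) reduce to binomial tails of the background alone. The sketch below assumes the weak-identification example ($N = 26000$, $N^Y = 142$, $F_b = 0.005$); the function names are hypothetical.

```python
import math

def log_b(k, n, q):
    """Logarithm of the binomial probability b(k|n,q), 0 < q < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log1p(-q))

def f_bounds_at_zero(n_y, n, f_b):
    """F_1(0) and F_2(0) of eq. (6), using P(k|N,0) = b(k|N,Fb)."""
    # Re-written sums as recommended in the text:
    # F_2 = 1 - sum_{k=0}^{N^Y-1} P(k|N,p),  F_1 = F_2 - P(N^Y|N,p).
    below = sum(math.exp(log_b(k, n, f_b)) for k in range(n_y))
    f2 = 1.0 - below
    f1 = f2 - math.exp(log_b(n_y, n, f_b))
    return f1, f2

print(f_bounds_at_zero(142, 26000, 0.005))
```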

In addition, or as an alternative to the outlined approach, one may be interested in finding, for
$N_s = 0, 1, \dots, N$, the probabilities that there are $N_s$ signals in the data. That could be done
using the probability densities $P_i(p|N,N^Y)$, $(i = 1, 2)$, but a calculation starting off from
$P(N^Y|N_s)$, equation (3), is far more direct. In particular, it may sometimes be of advantage
that $N_s$, in contrast to $p$, is a discrete variable. Let us now denote the probability distribution
for signals by $F(N_s) = F(N_s|N^Y)$. Lower and upper bounds are

$$ F_1(N_s) = \sum_{k=N^Y+1}^{N} P(k|N_s) \ \le\ F(N_s) \ \le\ F_2(N_s) = \sum_{k=N^Y}^{N} P(k|N_s) . \tag{7} $$

Despite using the same symbols $F$, $F_1$ and $F_2$, the functions in equations (6) and (7) are, of
course, different. By definition, $F_2(N_s)$ is the likelihood that $N_s$ signals could have produced
the observed $N^Y$ or a greater one. Therefore, the likelihood that any one of $k = 0, 1, \dots, N_s$ is
correct is less than or equal to the value $F_2(N_s)$, i.e. $F_2(N_s)$ is an upper bound of the a-posteriori
distribution function $F(N_s)$. Similarly, $1 - F_1(N_s)$ is an upper bound on the likelihood
that any one of $k = N_s, N_s + 1, \dots, N$ is correct. Consequently, $F_1(N_s)$ is a lower bound of
$F(N_s)$. Figure 8 depicts these bounds for our standard examples $N^Y = 142$ and 194. The
similarity with figure 6 is no coincidence, as $p$ determines $N_s$ up to fluctuations of order
$1/\sqrt{N}$. Probability densities $P_i(N_s|N^Y)$ are defined by

$$ F_i(N_s|N^Y) = \sum_{k=0}^{N_s} P_i(k|N^Y) , \quad (i = 1, 2) . $$

Defining $F_i(-1) = 0$, they follow recursively

$$ P_i(N_s|N^Y) = F_i(N_s) - F_i(N_s - 1) , \quad (N_s = 0, 1, \dots, N) . \tag{8} $$
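
A small-$N$ sketch of equations (7) and (8) may help to fix the bookkeeping. It is an illustration with hypothetical function names and toy parameters ($N = 10$, $N^Y = 5$, $F_s = 0.9$, $F_b = 0.2$), not the large-$N$ calculation behind figures 8 and 9.

```python
import math

def b(k, n, p):
    """Binomial probability b(k|n,p); zero outside 0 <= k <= n."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def tag_prob_given_ns(n_y, n_s, n, f_s, f_b):
    """Eq. (3): P(N^Y|Ns)."""
    return sum(b(j, n_s, f_s) * b(n_y - j, n - n_s, f_b) for j in range(n_y + 1))

def f_bounds(n_s, n_y, n, f_s, f_b):
    """Eq. (7): F_1(Ns) sums P(k|Ns) over k > N^Y, F_2(Ns) over k >= N^Y."""
    f2 = sum(tag_prob_given_ns(k, n_s, n, f_s, f_b) for k in range(n_y, n + 1))
    f1 = f2 - tag_prob_given_ns(n_y, n_s, n, f_s, f_b)
    return f1, f2

def densities(n_y, n, f_s, f_b):
    """Eq. (8): P_i(Ns|N^Y) = F_i(Ns) - F_i(Ns - 1), with F_i(-1) = 0."""
    p1, p2, prev1, prev2 = [], [], 0.0, 0.0
    for n_s in range(n + 1):
        f1, f2 = f_bounds(n_s, n_y, n, f_s, f_b)
        p1.append(f1 - prev1)
        p2.append(f2 - prev2)
        prev1, prev2 = f1, f2
    return p1, p2

p1, p2 = densities(5, 10, 0.9, 0.2)
print(sum(p1), sum(p2))   # telescoping sums: F_1(N) and F_2(N)
```

By construction the densities sum to $F_i(N)$. For such a small $N$ this is slightly below one, whereas it is practically one when $N F_s$ is much larger than $N^Y$.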

Figure 9 exhibits the results. Once the probability densities $P_i(N_s|N^Y)$, $\sum_{N_s=0}^{N} P_i(N_s|N^Y) = 1$,
are known, confidence limits can also be calculated from the subsequent generalization of
the Clopper-Pearson [1] approach:

$$ p^c_- = \sum_{N_s=0}^{N} P_2(N_s|N^Y) \sum_{k=N_s}^{N} b(k|N,p_-) , \qquad p^c_+ = \sum_{N_s=0}^{N} P_1(N_s|N^Y) \sum_{k=0}^{N_s} b(k|N,p_+) . \tag{9} $$

These equations involve nothing but weighting the binomial Clopper-Pearson sums with
the appropriate probabilities $P(N_s|N^Y)$. They reproduce the bounds (5) identically, as was
numerically checked for our examples of section 3.



5 Bayesian Approach 

Our construction invokes the a-priori known fact that the number of signals is in the range
$0 \le N_s \le N$. It is popular (for reviews see [10, 11]), and sometimes quite successful, to make
additional assumptions in the form of a-priori likelihoods. This can be motivated by a look at
figures 2 and 3. For almost all $p$ the confidence is better than the desired 68.2%. If an
a-priori likelihood is known, the Bayesian approach yields a confidence of precisely 68.2%. The
debate is about using a-priori likelihoods in situations where they are not known. Reasonable
guesses can apparently be made in many situations. However, false results are obtained when
such a guess is in contradiction with the data, which may not always be trivial to uncover.
An example is given here.

In our situation, one would be tempted to impose an a-priori likelihood on either $p$ or
$N_s$. For instance, invoking the maximum-entropy principle [12] leads to constant a-priori
probability densities

$$ P^0(p) = 1 \quad \text{or} \quad P^0(N_s) = \frac{1}{N+1} \quad \text{for} \quad 0 \le N_s \le N . \tag{10} $$



For simplicity we focus on the latter case. (Using the result from [13] it would also be
straightforward to work out the other one.) As before, $N^Y$ is determined by measurements
and NN analysis. Under the assumption (10) for $P^0(N_s)$, the Bayesian theorem implies the
a-posteriori probability

$$ P(N_s|N^Y) = \text{const.}\ P(N^Y|N_s) , \tag{11} $$

with $P(N^Y|N_s)$ given by equation (3), where the constant follows from the normalization
$\sum_{N_s=0}^{N} P(N_s|N^Y) = 1$. In the case of $N^Y = 194$ the result agrees very well with that depicted in
figure 9. However, this is not true for $N^Y = 142$, see figure 10, where the probability density
of figure 9 is compared with the Bayesian result.
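
Equation (11) is straightforward to evaluate. The sketch below is a hedged illustration for the weak-identification example ($N = 26000$, $N^Y = 142$, $F_s = 0.1$, $F_b = 0.005$). To keep the run time short it truncates the normalization sum at a hypothetical cut-off `ns_max = 2000`, an assumption justified only by the likelihood $P(N^Y|N_s)$ being negligible for larger $N_s$ in this example. The function names are likewise hypothetical.

```python
import math

def log_b(k, n, q):
    """Logarithm of the binomial probability b(k|n,q), 0 < q < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log1p(-q))

def tag_prob_given_ns(n_y, n_s, n, f_s, f_b):
    """Eq. (3), evaluated with logarithms so that large N causes no overflow."""
    n_b = n - n_s
    total = 0.0
    # j tagged signals and n_y - j tagged background events.
    for j in range(max(0, n_y - n_b), min(n_s, n_y) + 1):
        total += math.exp(log_b(j, n_s, f_s) + log_b(n_y - j, n_b, f_b))
    return total

def bayes_posterior(n_y, n, f_s, f_b, ns_max):
    """Eq. (11) with the flat prior of eq. (10), truncated at ns_max (assumption)."""
    like = [tag_prob_given_ns(n_y, n_s, n, f_s, f_b) for n_s in range(ns_max + 1)]
    norm = sum(like)
    return [x / norm for x in like]

posterior = bayes_posterior(142, 26000, 0.1, 0.005, ns_max=2000)
print(posterior[0])   # Bayesian a-posteriori probability of Ns = 0
```

The first entry of `posterior` can be compared with the rigorous lower bound $F_1(0) = 0.136$ of section 4.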

Whereas for strong signal identification the difference with our approach is practically
negligible, it is significant for weak (or no) signal identification. The Bayesian probability
for $p = 0$ is only 0.00215, implying that the Bayesian distribution function violates the
rigorous $F_1(N_s)$ bound. The reason is obvious: It is not clear what a-priori probability one
should assign to the situation that there is no effect at all. That is why $N_s = 0$ does not
compete on the same level as the numbers $N_s \ge 1$. For one of the situations we have
in mind, no effect at all would mean that there is no top quark. One could assign a finite
a-priori probability to this possibility, but whether this is 10%, 50% or 90% would be highly
arbitrary. Actually it does not even work: As the top quark has already been found [?], one
may argue in favor of the given a-priori likelihood with the argument that the $N_s = 0$ probability is
certainly very small. However, this leads to an overestimation of large signal probabilities,
as the a-posteriori $N_s = 0$ likelihood becomes incorrectly re-distributed.

6 Conclusions 

We have calculated confidence limits of an unknown signal likelihood for the situation where
few signals occur in a large number of events. The only input used was the neural network
efficiencies for tagging signal and background events as well as the number of data the
network selects. The extension of our approach from the binomial to the multinomial case,
i.e. to more than two different types of data (signal and background), is certainly possible.

In typical applications the efficiencies $F_s$ and $F_b$ may not be known exactly either. Instead,
a number of training sets ($j = 1, \dots, J$) may exist, each giving somewhat different efficiencies
$F^j_s$, $F^j_b$. We think that in this situation a bootstrap type of approach [?] can be applied and
that the probability density (8) provides a suitable starting point. We can linearly combine
different probability densities to an ultimate one

$$ P_i(N_s) = J^{-1} \sum_{j=1}^{J} P^j_i(N_s|N^Y_j) , \quad (i = 1, 2) , $$

and proceed with $P_i(N_s)$ as discussed in section 4.
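
The linear combination above amounts to an element-wise average of the densities of eq. (8) over the training sets. The following toy sketch assumes $N = 10$ and three invented triples $(F^j_s, F^j_b, N^Y_j)$, which are hypothetical and not taken from the text. It only illustrates the averaging step, not a full bootstrap analysis.

```python
import math

def b(k, n, p):
    """Binomial probability b(k|n,p); zero outside 0 <= k <= n."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def tag_prob_given_ns(n_y, n_s, n, f_s, f_b):
    """Eq. (3): P(N^Y|Ns)."""
    return sum(b(j, n_s, f_s) * b(n_y - j, n - n_s, f_b) for j in range(n_y + 1))

def density_p2(n_y, n, f_s, f_b):
    """P_2(Ns|N^Y) for Ns = 0..N, from eqs. (7) and (8)."""
    dens, prev = [], 0.0
    for n_s in range(n + 1):
        f2 = sum(tag_prob_given_ns(k, n_s, n, f_s, f_b) for k in range(n_y, n + 1))
        dens.append(f2 - prev)
        prev = f2
    return dens

def average(densities):
    """P_i(Ns) = (1/J) * sum_j P_i^j(Ns|N^Y_j), element by element."""
    j = len(densities)
    return [sum(col) / j for col in zip(*densities)]

# Hypothetical (Fs^j, Fb^j, N^Y_j) for J = 3 training sets; illustration only.
TRAINING_SETS = [(0.90, 0.20, 5), (0.85, 0.25, 5), (0.92, 0.18, 6)]
N = 10
per_set = [density_p2(n_y, N, f_s, f_b) for f_s, f_b, n_y in TRAINING_SETS]
p2_avg = average(per_set)
print(p2_avg)
```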

Finally, to involve conjectured a-priori likelihoods may in many situations be unavoidable 
and, actually, be quite successful. In our case: When a clear, positive signal identification 
is possible, we find practically no difference between a Bayesian maximum entropy and our 
approach. However, our example of weak signal identification shows that a-priori likelihoods 
are better avoided when a rigorous alternative exists. 

Figure 1: Binomial probability densities corresponding to solving equations (2) for $p_+$ and
$p_-$ with $p^c_- = p^c_+ = 0.159$, $N = 26$k and $N_s = 130$.

Figure 2: Confidence likelihoods for the Clopper-Pearson bounds (2). The parameters $N = 10$
and $p^c_- = p^c_+ = 0.159$ are used. Upper, broken line: Confidence likelihood $P = P^c_+(p)$ (2b)
versus the true signal probability $p$. Lower, full line: Confidence likelihood $P = P^c(p)$ (2c)
versus $p$.

Figure 3: Confidence likelihoods for the generalized Clopper-Pearson bounds (5). The
parameters $F_s = 0.9$, $F_b = 0.2$ and those of figure 2 are used. Upper, broken line: Confidence
likelihood $P = P^c_+(p)$ (2b) versus the true signal probability $p$. Lower, full line: Confidence
likelihood $P = P^c(p)$ (2c) versus $p$.

Figure 4: Probability densities to get $N^Y$ data from the NN employing the efficiencies
$F_b = 0.005$, $F_s = 0.5$ and assuming $N_s$ signals in the original 26k data set.

Figure 7: A-posteriori signal probability densities (corresponding to upper and lower bounds
of the distribution functions in figure 6).

Figure 8: A-posteriori distributions for the number of signals (upper and lower bounds).

Figure 9: A-posteriori probability densities for the number of signals (corresponding to upper
and lower bounds of the distribution functions in figure 8).

Figure 10: $N^Y = 142$: Comparison of the Bayesian (maximum entropy) a-posteriori
probability density $P(N_s)$ with $P_1(N_s)$ and $P_2(N_s)$ of figure 9.